Abstract

The well-known nonparametric maximum likelihood estimate of a distribution function $F_0$
is the empirical c.d.f. $\hat F$. Point estimates for many statistical functionals $T(F_0)$ are taken to be
$T(\hat F)$. This article explores interval estimation of $T(F_0)$ by
$(\inf_{R(F) \ge c} T(F), \ \sup_{R(F) \ge c} T(F))$,
where $R$ is a nonparametric likelihood ratio function for distributions. For the mean, under
regularity conditions, it is shown that the interval has asymptotic coverage $\alpha$ where $-2\log c$ is
$\chi^2_{(1),\alpha}$, the $\alpha$ quantile of the chisquare distribution on 1 degree of freedom. Thus, the theorem
of Wilks on the asymptotic distribution of the likelihood ratio has a nonparametric analog.
From the mean the result extends to a family of M-estimators. The n.l.r. intervals are related
to the bootstrap confidence intervals based on nonparametric tilting of Efron (1981, section
11).


*  Work supported by Office of Naval Research contract N00014-83-K-0472.
1.  Introduction


Suppose the random variables $X_1, \ldots, X_n$ are an i.i.d. sample from a distribution $F_0$.

The well-known non-parametric maximum likelihood estimate of $F_0$ is

$$\hat F = \frac{1}{n} \sum_{i=1}^n \delta_{X_i},$$

where $\delta_{X_i}$ is the distribution function of a point-mass at $X_i$.

The likelihood function evaluated at a distribution $F$ with mass $w_i \ge 0$ on $X_i$ is

$$L(F) = \prod_{i=1}^n w_i,$$

and this is easily seen to be maximized at $\hat F$. In this article the function $L$ is restricted to
distributions with support only on the observed sample. That is, it is assumed that $\sum w_i = 1$.

The likelihood ratio function

$$R(F) = L(F)/L(\hat F) = \prod_{i=1}^n n w_i$$

is defined on the same simplex as $L$. Throughout this paper $F$ will be a generic member of
the simplex identified with the generic weights $w_i$, $i = 1, \ldots, n$.
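
As a small illustration of the definition, the ratio $R(F) = \prod n w_i$ can be evaluated directly from a vector of weights on the sample. The function name and the two weight vectors below are hypothetical, chosen only for this sketch.

```python
import math

def likelihood_ratio(weights):
    """R(F) = prod_i (n * w_i) for weights placed on the n observed points."""
    n = len(weights)
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be nonnegative and sum to 1")
    return math.prod(n * w for w in weights)

uniform = [0.25, 0.25, 0.25, 0.25]   # the empirical c.d.f.
tilted = [0.10, 0.20, 0.30, 0.40]    # a reweighted member of the simplex

r_uniform = likelihood_ratio(uniform)   # equals 1, the maximum
r_tilted = likelihood_ratio(tilted)     # strictly below 1
```

Any departure from equal weights lowers the ratio, which is why the region where the ratio is at least $c$ shrinks toward $\hat F$ as $c$ approaches 1.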

Suppose we are interested in estimating the functional $T$, defined on the simplex and
on $F_0$. The nonparametric m.l.e. of $T$ is $T(\hat F)$. If $T$ is continuous on the simplex, as many
statistical functionals of practical interest are, then the image of $T(\cdot)$ over the part of the
simplex in which $R(F) \ge c$ is an interval by the compactness and connectedness of that subset.
The smaller $c$ is, the larger the interval is, and the family of intervals is nested. The idea of
this paper is to use the intervals so obtained as a family of confidence intervals for $T$. It should
be mentioned that the region $R \ge c$ is convex, since this fact could make the numerical
optimization of $T$ easier.

For fairly general parametric situations, Wilks (1938) showed that $-2\log(R)$ has an
asymptotic chisquare distribution, where $R$ is the ratio of the likelihood evaluated at the
true parameter to the maximum of the likelihood function. This fact can be used to form
confidence intervals for the parameter that have asymptotically correct coverage. The errors
are of order $1/\sqrt{n}$. In some situations small sample distributions of the likelihood ratio are
available. Thus in the parametric likelihood ratio interval problem there is a reasonable way
to choose $c$.

In the completely nonparametric situation we cannot possibly be so lucky. The distribution
theory for a likelihood ratio when the number of parameters is tending to infinity at the
same rate as the number of observations will be different. In particular, if the true distribution
is continuous it will always have a 0 likelihood ratio. If the functional $T(F)$ is badly behaved,
as is for example an indicator that is 1 iff $F$ is continuous, or if $T$ is the mean and $F_0$ is Cauchy,
then this approach cannot possibly work. Fortunately, many statistical applications are much
less perverse, and the nonparametric likelihood ratio method is not degenerate. It is usually
the case that one can get $T$ right without getting $F_0$ right.

This  article  focuses  attention  on  the  mean  although  extensions  to  certain  M-estimates 
follow  almost  immediately. 

In the next section, the nonparametric likelihood ratio interval estimates for the mean
are derived. Section 3 computes the asymptotic probability of covering the true mean by such
an interval. Theorem 1 in that section shows that the asymptotic coverage of the mean can be
obtained from the $\chi^2_{(1)}$ distribution in the same way as in regular one-dimensional parametric
families with parametric likelihoods. In particular it follows that asymptotic non-parametric
likelihood ratio tests for the mean can be obtained. In section 4, the n.l.r. intervals are extended
to a class of M-estimates. Section 5 reconsiders the restriction to distributions supported on
the sample.

A similar approach was used by Thomas and Grunkemeier (1975) to get confidence intervals
for the survival function in the presence of censoring. They provide a heuristic argument
for  the  asymptotic  chisquare  distribution  of  their  estimator.  Their  situation  differs  in  that  the 
extrema  of  R  over  all  distributions  are  usually  equivalent  to  the  extrema  over  distributions 
with  support  on  the  observed  (failure)  times.  The  only  exception  is  the  somewhat  degenerate 
case  where  one  seeks  an  interval  estimate  of  the  probability  of  survival  past  a  time  b  that  does 
not  lie  between  the  smallest  and  largest  observed  failure  time. 

The motivation here was to find a way to compute nonparametric confidence intervals
without having to know which bootstrap technique to choose. That problem can be acute when,
for example, the bias-corrected percentile method and the percentile method choose intervals
skewed in opposing directions. (Recent bootstrap confidence interval methods may overcome
this difficulty. See Efron (1985).) In many parametric situations likelihood ratio intervals
automatically get the right skewness, and the thought was that nonparametric likelihood ratios
might also.

Ironically the resulting technique is similar to another bootstrap confidence interval technique,
namely the nonparametric tilting bootstrap confidence intervals of Efron (1981, section
11). There are two differences. One difference is that the family of intervals is based on weights
(11.8)  instead  of  (11.1)  of  Efron  (1981).  The  other  difference  is  that  the  chosen  member  of  the 
family  of  intervals  is  taken  from  the  asymptotic  chisquare  result  of  section  3  rather  than  from 
resampling  from  tilted  distributions.  For  the  mean  this  could  be  quite  a  large  computational 
saving.  For  more  complicated  functionals,  the  numerical  optimization  could  be  more  work 
than  resampling. 

2.  N.l.r.  interval  estimates  of  the  mean 

The mean is written as a functional as

$$M(F) = \int x \, dF(x) = \sum_{i=1}^n w_i X_i.$$

To obtain the n.l.r. interval corresponding to $c$ one must maximize and minimize $M(F)$ subject
to the constraints $\sum w_i = 1$, $\prod n w_i \ge c$ and $w_i \ge 0$. In practice it suffices to use the simpler
constraint $\prod n w_i = c$. For most functionals of interest the product constraint is binding. Even
if it is not, a typical application involves calculating the bounds for many $c$ values and one
could use the equality constrained extrema to compute the inequality constrained values.

Using Lagrange multipliers, let

$$G = \sum w_i X_i + \lambda_1 \Bigl(1 - \sum w_i\Bigr) + \lambda_2 \Bigl(\log c - \sum \log(n w_i)\Bigr).$$

Setting the partial derivatives of $G$ with respect to $w_i$ to zero yields

$$w_i = \frac{\lambda_2}{X_i - \lambda_1}.$$

$\lambda_2$ can be obtained as a normalizing constant. For a given set of weights the maximum of $M$
is obtained when the largest weights are attached to the largest $X_i$, and the minimum obtains
when the weights decrease with $X_i$. Therefore $\lambda_1$ cannot be in the interval $(X_{(1)}, X_{(n)})$, for
then the $w_i$ would not be monotone. This also guarantees that they must all be of the same
sign, that is, they must all be positive. For the minimum $M^{lo} = \sum w_i^{lo} X_i$ where

$$w_i^{lo} = \frac{(X_i - X_{(1)} + \eta^{lo})^{-1}}{\sum_j (X_j - X_{(1)} + \eta^{lo})^{-1}}$$

and for the maximum $M^{up} = \sum w_i^{up} X_i$ where

$$w_i^{up} = \frac{(X_{(n)} - X_i + \eta^{up})^{-1}}{\sum_j (X_{(n)} - X_j + \eta^{up})^{-1}}$$

for appropriate $\eta^{lo}$ and $\eta^{up}$.

The $\eta$ values can be found by a one dimensional zero-finding algorithm such as Newton's
method. For $c$ near 1, the $\eta$ values are very large and positive and the weights are nearly equal.
As $c$ decreases the $\eta$ values approach 0, and all the weight goes to the appropriate extreme
observation.
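
The zero-finding step can be sketched as follows. This is only an illustration under stated assumptions: it uses bisection on $\log \eta$ rather than Newton's method, the function names are hypothetical, and the value $c = 0.1465$ is roughly the cutoff that a 95% chisquare calibration would give. The bisection relies on the log likelihood ratio being nondecreasing in $\eta$ along this family of weights.

```python
import math

def tilted_weights(x, eta, upper=True):
    """Weights proportional to 1/(X_(n) - x_i + eta) for the upper endpoint,
    or 1/(x_i - X_(1) + eta) for the lower endpoint."""
    x_min, x_max = min(x), max(x)
    raw = [1.0 / ((x_max - xi if upper else xi - x_min) + eta) for xi in x]
    total = sum(raw)
    return [r / total for r in raw]

def log_ratio(weights):
    """log R(F) = sum_i log(n * w_i)."""
    n = len(weights)
    return sum(math.log(n * w) for w in weights)

def endpoint(x, c, upper=True, iters=200):
    """Bisect on log(eta) until the tilted weights give log R(F) = log c,
    then return the weighted mean at that eta."""
    span = max(x) - min(x)
    lo_e, hi_e = math.log(span * 1e-12), math.log(span * 1e12)
    target = math.log(c)
    for _ in range(iters):
        mid = 0.5 * (lo_e + hi_e)
        if log_ratio(tilted_weights(x, math.exp(mid), upper)) < target:
            lo_e = mid   # ratio below c: eta is too small
        else:
            hi_e = mid
    w = tilted_weights(x, math.exp(0.5 * (lo_e + hi_e)), upper)
    return sum(wi * xi for wi, xi in zip(w, x))

x = [-2.2, -1.6, -1.2, -1.0, -0.2, -0.0, 0.2, 0.3, 1.0, 1.2]
m_lo = endpoint(x, 0.1465, upper=False)
m_up = endpoint(x, 0.1465, upper=True)
```

Bisection was chosen here only because it needs no derivative and cannot leave the bracket; Newton's method would typically need fewer iterations.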

The limiting values of the bounds as $c$ approaches 0 are $X_{(1)}$ and $X_{(n)}$, the smallest and
largest observed values. This means that no n.l.r. confidence interval for the mean will go
outside  the  observed  range  of  the  data.  This  stands  in  contrast  to  parametric  confidence 
intervals  which  may  easily  go  outside  the  observed  range.  The  very  extreme  confidence  limits 
in  the  nonparametric  case  are  obviously  going  to  be  suspect.  It  would  be  interesting  to  learn 
how  close  to  the  edge  one  can  safely  go.  In  the  parametric  case  the  extreme  confidence  limits 
will  depend  heavily  on  difficult  to  verify  assumptions  about  the  tails  of  the  distributions  in  the 
model  and  may  therefore  be  equally  dubious. 

Instead of calculating the $\eta$ values for a prescribed $c$, it is more convenient to calculate
$c$, $M^{up}$ and $M^{lo}$ over a grid of $\eta$'s. This allows the plotting of the log likelihood ratio as a
function of $x$, and the limits for any desired $c$ can be obtained via interpolation.

The following observations were generated from the standard normal distribution:

-2.2  -1.6  -1.2  -1.0  -0.2  -0.0  0.2  0.3  1.0  1.2

The mean of the observations is -0.34 and the mean square is 1.24.

The nonparametric log likelihood ratio function for the mean based on these points is
plotted in figure 1 as a solid line. The unconnected dots represent the true log likelihood ratio
function under the assumption that the sample is from $N(\mu, 1)$.

[Figure 1: Parametric and Nonparametric Likelihood Ratio Functions for a Normal Mean]

[Figure 2: Parametric and Nonparametric Likelihood Ratio Functions for a Normal Mean Square]


Their squares are distributed as $\chi^2_{(1)}$, and the nonparametric log likelihood ratio function
for the mean square is plotted in figure 2 as a solid curve. The unconnected dots in figure 2
represent the true log likelihood ratio function under the assumption that the sample is from
$N(0, \sigma^2)$.

The  nonparametric  log  likelihood  ratio  functions  look  surprisingly  like  the  parametric 
ones,  considering  that  only  10  points  are  available. 

In figure 1 the n.l.r. interval endpoints were calculated for $\eta$ values equal to $2^j (X_{(n)} - X_{(1)})$
for $j = -5, \ldots, 10$. In figure 2 the largest and smallest squared observations were used in the
$\eta$'s.
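
The grid calculation for the mean can be sketched as below, using the ten (rounded) observations listed above and the grid $\eta = 2^j (X_{(n)} - X_{(1)})$, $j = -5, \ldots, 10$. The helper names and the linear interpolation rule are assumptions made for this illustration, and the cutoff $\log c = -1.92$ is only an approximate 95% calibration.

```python
import math

x = [-2.2, -1.6, -1.2, -1.0, -0.2, -0.0, 0.2, 0.3, 1.0, 1.2]
n, x_min, x_max = len(x), min(x), max(x)

def curve_point(eta, upper):
    """Return (weighted mean, log likelihood ratio) for one value of eta."""
    raw = [1.0 / ((x_max - xi if upper else xi - x_min) + eta) for xi in x]
    total = sum(raw)
    w = [r / total for r in raw]
    mean = sum(wi * xi for wi, xi in zip(w, x))
    log_r = sum(math.log(n * wi) for wi in w)
    return mean, log_r

etas = [2.0 ** j * (x_max - x_min) for j in range(-5, 11)]
upper_branch = [curve_point(eta, True) for eta in etas]
lower_branch = [curve_point(eta, False) for eta in etas]

def interpolate(branch, log_c):
    """Linearly interpolate the mean at a given log likelihood ratio."""
    pts = sorted(branch, key=lambda p: p[1])
    for (m0, l0), (m1, l1) in zip(pts, pts[1:]):
        if l0 <= log_c <= l1:
            return m0 + (m1 - m0) * (log_c - l0) / (l1 - l0)
    raise ValueError("log_c lies outside the computed grid")

log_c = -1.92
interval = (interpolate(lower_branch, log_c), interpolate(upper_branch, log_c))
```

Plotting the two branches (mean on the horizontal axis, log likelihood ratio on the vertical axis) gives a curve of the kind described for figure 1.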

The  median  also  has  a  quite  tractable  algorithm  for  determining  n.l.r.  interval  estimates, 
as  do  other  quantiles  but  these  reduce  to  certain  quantities  based  on  the  binomial  distribution. 
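
One way to see the reduction to the binomial is sketched below. It assumes the constraint is that $F$ puts total mass $p$ on the $k$ observations at or below a candidate value; the likelihood ratio is then maximized by spreading $p$ equally over those $k$ points and $1 - p$ equally over the remaining $n - k$, which gives the binomial likelihood ratio $p^k (1-p)^{n-k} / [(k/n)^k ((n-k)/n)^{n-k}]$. The function name is hypothetical and this is not claimed to be the algorithm referred to in the text.

```python
import math

def quantile_log_ratio(k, n, p=0.5):
    """Largest log R(F) over weights on the sample that put total mass p
    on the k observations at or below the candidate quantile value."""
    if not (0 < k < n and 0.0 < p < 1.0):
        return float("-inf")   # treated as infeasible in this sketch
    return k * math.log(n * p / k) + (n - k) * math.log(n * (1.0 - p) / (n - k))

n = 10
# Log likelihood ratio for the median as a function of k, the number of
# observations at or below the candidate value.
median_curve = {k: quantile_log_ratio(k, n) for k in range(n + 1)}
```

The curve depends on the data only through the count $k$, so the interval endpoints are order statistics.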

3.  Asymptotics  for  N.l.r.  Estimation  of  the  Mean 

In this section it is proved that if $F_0$ has bounded support and is not degenerate then
the n.l.r. interval for the mean of $F_0$ has asymptotic coverage $\alpha$ when $-2\log c = \chi^2_{(1),\alpha}$, the
$\alpha$ quantile of the chisquare distribution on 1 degree of freedom. The author conjectures that
the bounded support condition can be replaced by existence of some moment higher than the
first.

Theorem 1

Let $X_i \in [-M, M]$ be i.i.d. from a nondegenerate distribution $F_0$. If

$$X_U = \sup \sum_{i=1}^n w_i X_i, \qquad X_L = \inf \sum_{i=1}^n w_i X_i$$

where both extrema are taken over $R(F) = \prod n w_i \ge c$, then

$$P(X_L < E(X_1) < X_U) \to P(\chi^2_{(1)} < -2\log c).$$
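
In practice the theorem is used by choosing $c$ from a target coverage $\alpha$ through $-2\log c = \chi^2_{(1),\alpha}$. A minimal sketch of that calibration follows. It uses the fact that a chisquare variable on 1 degree of freedom is the square of a standard normal, so only the standard library is needed; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_1_quantile(alpha):
    """alpha quantile of the chisquare distribution on 1 degree of freedom."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + alpha))
    return z * z

def critical_ratio(alpha):
    """The cutoff c solving -2 log c = chi2_{(1),alpha}."""
    return math.exp(-0.5 * chi2_1_quantile(alpha))

c95 = critical_ratio(0.95)   # about 0.1465
```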


Proof: Without loss of generality $E(X_1) = 0$. $P(X_{(1)} < 0, X_{(n)} > 0) \to 1$, since $F_0$ is not
degenerate, so we can assume that there is at least one observation of each sign. It follows that

$$R^0 = \sup\Bigl\{ \prod n w_i \Bigm| \sum w_i = 1, \ \sum w_i X_i = 0 \Bigr\}$$

exists.

Now $X_L < 0 < X_U$ iff $c < R^0$. We show that $-2\log R^0$ has an asymptotic chisquare
distribution.

To find $R^0$, let

$$G = \sum \log(n w_i) + \gamma\Bigl(1 - \sum w_i\Bigr) + n\lambda\Bigl(0 - \sum w_i X_i\Bigr).$$


Setting $\partial G/\partial w_i = 0$ one obtains

$$w_i = \frac{1}{\gamma + n\lambda X_i}$$

and summing $w_i \, \partial G/\partial w_i$ shows that $\gamma = n$. It follows that
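
The summation step can be written out as follows, using only the two constraints $\sum w_i = 1$ and $\sum w_i X_i = 0$ from the definition of $R^0$ (with $\gamma$ and $\lambda$ denoting the two multipliers):

```latex
\sum_{i=1}^n w_i \frac{\partial G}{\partial w_i}
  = \sum_{i=1}^n w_i \Bigl( \frac{1}{w_i} - \gamma - n\lambda X_i \Bigr)
  = n - \gamma \sum_{i=1}^n w_i - n\lambda \sum_{i=1}^n w_i X_i
  = n - \gamma = 0 .
```

Substituting $\gamma = n$ gives $n w_i = 1/(1 + \lambda X_i)$.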


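$$\log(R^0) = \sum \log(n w_i) = -\sum \log(1 + \lambda^0 X_i)$$

where $\lambda^0$ satisfies

$$0 = \frac{1}{n} \sum \frac{X_i}{1 + \lambda^0 X_i} = g(\lambda^0). \qquad (1)$$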

The function $g(\lambda)$ defined by (1) is finite in an interval containing the origin. Since
$g'(\lambda) = -\frac{1}{n} \sum X_i^2/(1 + \lambda X_i)^2 < 0$ (recall that not all $X_i = 0$), $g$ is strictly decreasing in the
interval and equation (1) is easily seen to have a unique solution in the interval $(-1/X_{(n)}, -1/X_{(1)})$.

Note also that $g(0) = \bar X$, the sample mean.
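
Because $g$ is strictly decreasing on $(-1/X_{(n)}, -1/X_{(1)})$, equation (1) can be solved by any bracketing root finder. The sketch below uses plain bisection and returns $-2\log R^0$ for a hypothesized mean `mu` (the derivation above takes the mean to be 0 without loss of generality, so the data are shifted first). The function name and the treatment of values outside the sample range are assumptions of this illustration.

```python
import math

def neg2_log_ratio(x, mu=0.0, iters=200):
    """-2 log R^0 for the hypothesis that the mean equals mu."""
    z = [xi - mu for xi in x]
    if not (min(z) < 0.0 < max(z)):
        # No positive weights on the sample can give mean mu.
        return float("inf")
    n = len(z)

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z) / n

    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid   # g is decreasing, so the root lies to the right
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

x = [-2.2, -1.6, -1.2, -1.0, -0.2, -0.0, 0.2, 0.3, 1.0, 1.2]
stat_at_zero = neg2_log_ratio(x, 0.0)
stat_at_mean = neg2_log_ratio(x, sum(x) / len(x))   # essentially 0
```

A value `mu` lies inside the n.l.r. interval for a given $c$ when the returned statistic is at most $-2\log c$.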

$\lambda^0$ as defined by (1) is an M-estimate, although not one of location, with $\psi(X, \lambda) =
X/(1 + X\lambda)$. The M-estimate applied to $F_0$ is 0 since $F_0$ has mean zero. Since $E(\psi(X, \lambda))$ is
positive when $\lambda < 0$ and negative when $\lambda > 0$ it follows by proposition 3.2.1 of Huber (1981)
that $\lambda^0 \to 0$ strongly.

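Expand $h = g^{-1}$ in a Taylor's series about $\bar X$, and evaluate it at 0 to get

$$h(0) = \lambda^0 = h(\bar X) + (0 - \bar X) h'(\bar X) + \tfrac{1}{2}(0 - \bar X)^2 h''(\xi)$$

where $\xi$ is between 0 and $\bar X$. Now $h(\bar X) = 0$, $h'(\bar X) = 1/g'(0)$ and $h''(\xi) = -g''(\eta)/g'(\eta)^3$ where $\eta = h(\xi)$
is between 0 and $\lambda^0$. Therefore

$$\lambda^0 = \frac{\bar X}{\frac{1}{n}\sum X_i^2}
  + \bar X^2 \, \frac{\frac{1}{n}\sum X_i^3 (1 + \eta X_i)^{-3}}{\bigl[\frac{1}{n}\sum X_i^2 (1 + \eta X_i)^{-2}\bigr]^3}.$$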
Let $r^0$ denote the second term in $\lambda^0$. A simple bound is

$$|r^0| \le \frac{\bar X^2 \, \frac{1}{n}\sum |X_i|^3 \, (1 + |\eta| M)^6}{\bigl(\frac{1}{n}\sum X_i^2\bigr)^3 (1 - |\eta| M)^3}.$$

Since $|\eta| < |\lambda^0| \to 0$ a.s. we can assume that $|\eta|$ is a small fraction of $1/M$. By the strong law
of large numbers applied to both $|X_i|^3$ and $X_i^2$, the probability is arbitrarily close to 1 that
$|r^0| < 2 \bar X^2 E(|X_i|^3)/E(X_i^2)^3 = O_p(n^{-1})$. By the central limit theorem $\sqrt{n}(\lambda^0 - r^0)$ tends
to a normal distribution with mean 0 and variance $1/E(X_1^2)$, and so $\lambda^0 = O_p(n^{-1/2})$.


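Now $P(\max |\lambda^0 X_i| > .5) \to 0$ so we may use the same Taylor expansion of $\log(1 + \lambda^0 X_i)$
for all sufficiently large $n$. Writing $\overline{X^2} = \frac{1}{n}\sum X_i^2$,

$$
\begin{aligned}
\log(R^0) &= -\sum \log(1 + \lambda^0 X_i) \\
 &= -\sum \bigl(\lambda^0 X_i - \tfrac{1}{2}(\lambda^0 X_i)^2 + \eta_i\bigr)
    \quad \text{where } |\eta_i| < |\lambda^0 X_i|^3/3 \\
 &= -n \bar X \lambda^0 + \frac{n}{2}\,\overline{X^2}\,(\lambda^0)^2 - \sum \eta_i \\
 &= -n \bar X \Bigl(\frac{\bar X}{\overline{X^2}} + r^0\Bigr)
    + \frac{n}{2}\,\overline{X^2}\,\Bigl(\frac{\bar X}{\overline{X^2}} + r^0\Bigr)^2 - \sum \eta_i \\
 &= -n \frac{\bar X^2}{\overline{X^2}} - r^0 n \bar X
    + \frac{n}{2} \frac{\bar X^2}{\overline{X^2}} + r^0 n \bar X
    + \frac{n}{2} (r^0)^2\, \overline{X^2} - \sum \eta_i \\
 &= -\frac{n \bar X^2}{2\, \overline{X^2}} + \frac{n}{2} (r^0)^2\, \overline{X^2} + n\, O_p(n^{-3/2}) \\
 &= -\tfrac{1}{2} \chi^2_{(1)} (1 + o_p(1)) + \frac{n}{2}\, O_p(n^{-2})\, O_p(1) + O_p(n^{-1/2})
\end{aligned}
$$

so $-2\log R^0 \to \chi^2_{(1)}$ in distribution as required. ■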

The error in the approximation is of order $1/\sqrt{n}$ since the central limit theorem operates
on the leading term at that rate and the other term is of that rate. This is the same rate
obtained by Wilks (1938) for the parametric case.
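
The asymptotic coverage can be checked informally by simulation. The sketch below draws samples from the uniform distribution on $[-1, 1]$ (bounded support, mean 0) and counts how often $-2\log R^0$ at the true mean falls below 3.841, which is approximately $\chi^2_{(1),0.95}$. The sample size, number of replications, seed and function name are arbitrary choices for this illustration and no particular coverage figure is claimed.

```python
import math
import random

def neg2_log_ratio(x, mu, iters=60):
    """-2 log R^0 for the hypothesis that the mean equals mu (bisection on equation (1))."""
    z = [xi - mu for xi in x]
    if not (min(z) < 0.0 < max(z)):
        return float("inf")
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

random.seed(0)
n, reps, cutoff = 40, 300, 3.841
hits = 0
for _ in range(reps):
    sample = [random.uniform(-1.0, 1.0) for _ in range(n)]
    # The interval covers the true mean 0 exactly when the statistic is below the cutoff.
    if neg2_log_ratio(sample, 0.0) <= cutoff:
        hits += 1
coverage = hits / reps
```

With larger `n` the estimated coverage should approach the nominal level, subject to Monte Carlo error of order $1/\sqrt{\text{reps}}$.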


4.  N.l.r.  intervals  for  M-estimates 

The  estimation  of  the  n.l.r.  confidence  intervals  for  M-estimates  is  similar  to  that  for  the 
mean  and  the  asymptotic  chisquare  distribution  carries  over  too. 

Let $T$ be a statistical functional defined by

$$0 = \int \psi(X, T) \, dF(X)$$

where $\psi(X, \tau)$ is nondecreasing in $X$, continuous and nonincreasing in $\tau$, and bounded. This
covers most commonly used robust M-estimates of location; the main quibble from a practical
point of view is that the concomitant estimation of the scale constant is being ignored.

Let $I_i(F) = \partial T(F)/\partial w_i$, the empirical influence of $T$ at $X_i$. The upper bound of the
n.l.r. interval for $T$ for some $c$ is

$$\sup_{R(F) \ge c} T(F).$$

The supremum is attained at weights satisfying

$$w_i = \frac{(I_i(F) - \lambda)^{-1}}{\sum_j (I_j(F) - \lambda)^{-1}}$$

by the same Lagrangian multiplier method as was used in the case of the mean. The infimum
is obtained similarly. As in the case of the mean the weights can be rewritten

$$w_i \propto (I_{(n)} - I_i(F) + \eta)^{-1}$$

for appropriate $\eta$ where $I_{(n)}$ is the influence of $X_{(n)}$ and is also the largest of the influences.

The big difference here is that the $I_i(F)$ depend on the $w_i$ through $F = \sum w_i \delta_{X_i}$ whereas
for the mean $I_i(F) = X_i$ is fixed. Therefore even for a fixed $\eta$ the corresponding point on
the likelihood ratio curve will have to be estimated iteratively. Another approach is to obtain
the likelihood ratio function for the M-estimate from that of the mean of $\psi(X, \tau)$ for a grid of
values of $\tau$. This idea underlies the proof of Theorem 2 below.
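
The grid approach can be sketched as follows: for each candidate $\tau$, compute the log likelihood ratio for the hypothesis that $\psi(X, \tau)$ has mean zero. The Huber-type $\psi(X, \tau) = \max(-k, \min(k, X - \tau))$ with $k = 1.5$ is a hypothetical choice made for this illustration (it is bounded, nondecreasing in $X$, and continuous and nonincreasing in $\tau$), as are the function names, the grid and the cutoff $-1.92$.

```python
import math

def log_ratio_zero_mean(z, iters=200):
    """log R^0: the largest log prod(n w_i) subject to sum w_i z_i = 0."""
    if not (min(z) < 0.0 < max(z)):
        return float("-inf")   # no positive weights give a zero mean
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * zi) for zi in z)

def huber_psi(x, tau, k=1.5):
    """Bounded, monotone psi; k is an arbitrary tuning constant here."""
    return max(-k, min(k, x - tau))

x = [-2.2, -1.6, -1.2, -1.0, -0.2, -0.0, 0.2, 0.3, 1.0, 1.2]
taus = [-1.5 + 0.1 * j for j in range(26)]   # grid from -1.5 to 1.0
curve = [(tau, log_ratio_zero_mean([huber_psi(xi, tau) for xi in x])) for tau in taus]

# Grid values of tau whose log likelihood ratio is at least log c.
inside = [tau for tau, log_r in curve if log_r >= -1.92]
```

The smallest and largest values in `inside` approximate the interval endpoints to within the grid spacing; no iteration over influence values is needed.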

Most of the work of extending the asymptotic distribution from the mean to the M-estimates
is done via the following lemma.

Lemma 1

Let $T(F)$ be the solution to $0 = \int \psi(X, T) \, dF(X)$ and let $\tau_0 = T(F_0)$. Assume that $\psi(X, \tau)$ is
nondecreasing in $X$, and continuous and nonincreasing in $\tau$, and that $T(F)$ exists for all $F$.
Let $\tau^{up} = \sup_{R \ge c} T(F)$, let $M^{up} = \sup_{R \ge c} \sum w_i \psi(X_i, \tau_0)$ and define $\tau^{lo}$ and $M^{lo}$ with infima
replacing suprema. Then $\tau_0 \in (\tau^{lo}, \tau^{up})$ implies $0 \in [M^{lo}, M^{up}]$ and $0 \in (M^{lo}, M^{up})$ implies
$\tau_0 \in (\tau^{lo}, \tau^{up})$.

Proof: Suppose $\tau_0 < \tau^{up}$. Then each $\psi(X_i, \tau_0) \ge \psi(X_i, \tau^{up})$ and hence
$\sup_{R \ge c} \sum w_i \psi(X_i, \tau_0) \ge \sup_{R \ge c} \sum w_i \psi(X_i, \tau^{up})$, i.e. $M^{up} \ge \sup_{R \ge c} \sum w_i \psi(X_i, \tau^{up})$.
This supremum is nonnegative since otherwise there would be weights $w_i^*$ satisfying the
likelihood ratio constraint for which $\sum w_i^* \psi(X_i, \tau^{up}) < 0$. But then $\tau^* = T(\sum w_i^* \delta_{X_i})$ exists
by the assumption on $T$ and $\tau^* > \tau^{up}$ by the continuity assumption, contradicting the
definition of $\tau^{up}$. Therefore $M^{up} \ge 0$. A similar argument shows that
$\tau_0 > \tau^{lo} \Rightarrow M^{lo} \le 0$, and so the first claim is proved.

Suppose $M^{up} > 0$. Then there are weights $w_i^*$ satisfying the likelihood ratio constraint
for which $\sum w_i^* \psi(X_i, \tau_0) > 0$. As above there exists a $\tau^* > \tau_0$ for which $\sum w_i^* \psi(X_i, \tau^*) = 0$ and
the definition of $\tau^{up}$ implies that $\tau^{up} \ge \tau^* > \tau_0$. Similarly $M^{lo} < 0 \Rightarrow \tau^{lo} < \tau_0$. ■

Now we can extend the asymptotic chisquare distribution to M-estimates via

Theorem 2

Let $T(F)$, the solution of $\int \psi(X, T) \, dF(X) = 0$, be an M-estimate satisfying the conditions of
Lemma 1. Suppose also that $\psi$ is bounded. Let $X_i$ be an i.i.d. sample of size $n$ from $F_0$ with
$\tau_0 = T(F_0)$. If $-2\log c = \chi^2_{(1),\alpha}$ then $P(\inf_{R \ge c} T(F) < \tau_0 < \sup_{R \ge c} T(F)) \to \alpha$.

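Proof: By Lemma 1

$$
\begin{aligned}
& P\Bigl(\inf_{R \ge c} \sum w_i \psi(X_i, \tau_0) < 0 < \sup_{R \ge c} \sum w_i \psi(X_i, \tau_0)\Bigr) \\
&\quad \le P\Bigl(\inf_{R \ge c} T(F) < \tau_0 < \sup_{R \ge c} T(F)\Bigr) \\
&\quad \le P\Bigl(\inf_{R \ge c} \sum w_i \psi(X_i, \tau_0) \le 0 \le \sup_{R \ge c} \sum w_i \psi(X_i, \tau_0)\Bigr)
\end{aligned}
$$

and by Theorem 1 both of the bounding probabilities tend to $\alpha$. ■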

5.  Final remarks

The  restriction  to  distribution  functions  with  support  on  the  observations  is  unnatural  in 
a  likelihood  setting.  We  certainly  do  not  believe  that  the  true  distribution  is  in  the  simplex 
determined  by  the  sample.  That  n.l.r.  intervals  can  work  is  just  a  sampling  property  of  the 
random  simplices  we  observe. 

For the mean, the extrema that define the n.l.r. interval equal the extrema over all
distributions with support on the closed interval $[X_{(1)}, X_{(n)}]$. This is easy to see, since if a
distribution puts any probability on a set in the interval that contains no observations that
probability can be "swept" to one or the other end producing a more extreme mean with no
reduction in the likelihood ratio. Thus the extrema can be said to be taken over an uncountably
infinite family of distributions. Without the restriction on the endpoints of the support,
weight $\epsilon$ can be put on a large unobserved $X$ and give rise to an arbitrarily large mean with a
likelihood ratio arbitrarily close to 1, and every n.l.r. interval becomes the whole real line.

The situation is different for M-estimators $T$ with monotone $\psi$ function and bounded
influence (including the mean if we know a bound for $X$). Then it is easy to show that the
constrained extrema of $T$ over all distributions are not much different than those over the
simplex and that the difference tends to zero as $n \to \infty$. A sketch of the proof is as follows.
Consider the upper bound of the confidence interval. Any weight off the simplex might as well
be put at $\infty$ (or at a known bound for $X$), so that the corresponding $\psi(\infty, \tau)$ equals the bound
on $\psi$. The amount of weight $w^*$ that can be put at $\infty$ while preserving $R \ge c > 0$ is bounded
by $1 - c^{1/n} + c^{1/2}/n \to 0$. View this as contamination. The supremum of the contaminated
M-estimate subject to the likelihood ratio condition is less than the uncontaminated supremum
plus the greatest difference that the contamination could make at any underlying distribution.
This latter quantity is $O(w^* B)$ where $B$ is the bound on the influence function of $T$. It follows
that n.l.r. intervals for $T$ in which the limits are taken over all distributions on the line will
have the same asymptotic coverage as those based on distributions supported on the sample.

Most  bootstrap  confidence  interval  techniques  also  work  exclusively  with  distributions 
supported  on  the  sample.  An  exception  is  Hjort  (1985)  who  constructs  a  Bayesian  bootstrap 
based  on  a  Dirichlet  prior. 